Advanced Data Aggregation with Pandas Functions

pandas

dataframe

aggregation

functions

Master advanced data aggregation techniques in Pandas using built-in functions, custom functions, and lambda expressions. Learn to create powerful summary statistics and custom aggregations.

Author

Mohammed Adil Siraju

Published

September 21, 2025

Data aggregation is a fundamental operation in data analysis that allows you to summarize and analyze data by groups. This notebook covers:

Built-in Aggregation Functions: Using Pandas’ built-in functions such as sum, mean, and max
Custom Aggregation Functions: Creating your own aggregation logic with lambda functions and custom functions
Multiple Aggregations: Applying several functions simultaneously
Dictionary-based Aggregation: Specifying different functions for different columns

Mastering these techniques will give you powerful tools for data summarization and analysis.

1. Setting Up Sample Data

Let’s create a sample dataset to demonstrate various aggregation techniques. We’ll work with categorical data and numerical values.

import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30]
}

df = pd.DataFrame(data)
df

	Category	Value
0	A	10
1	B	15
2	A	20
3	B	25
4	A	30

2. Built-in Aggregation Functions

Pandas provides many built-in aggregation functions that you can use with the agg() method. Passing a function name as a string, such as 'sum', 'mean', or 'max', computes that summary statistic for each group.

Sum Aggregation

Calculate the total sum of values for each category:

df.groupby('Category').agg({'Value': 'sum'})

	Value
Category
A	60
B	40

Mean Aggregation

Calculate the average value for each category:

df.groupby('Category').agg({'Value': 'mean'})

	Value
Category
A	20.0
B	20.0

Maximum Value Aggregation

Find the highest value in each category:

df.groupby('Category').agg({'Value': 'max'})

	Value
Category
A	30
B	25
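
For a single built-in statistic, the same numbers can also be reached by selecting the column and calling the method directly. The sketch below rebuilds the sample df from section 1 and compares the two spellings; the variable names via_agg and via_method are illustrative only and do not appear in the original notebook.

```python
import pandas as pd

# Same sample data as in section 1
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Dictionary syntax with agg() returns a DataFrame with a 'Value' column
via_agg = df.groupby('Category').agg({'Value': 'sum'})

# Selecting the column and calling the method returns a Series
via_method = df.groupby('Category')['Value'].sum()

print(via_agg)
print(via_method)
```

The dictionary form is worth the extra typing once several columns or several functions are involved; for one column and one statistic, the direct method call is usually easier to read.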

3. Custom Aggregation Functions

Sometimes built-in functions aren’t enough. Pandas allows you to create custom aggregation functions using lambda expressions or named functions.

Lambda Functions for Custom Aggregation

Create a lambda function to calculate the range (max - min) for each category:

custom_agg = lambda x: x.max() - x.min()

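# Apply the function to every non-grouping column (here, only 'Value')
df.groupby('Category').agg(custom_agg)
# or target the column explicitly; both calls return the same result
df.groupby('Category').agg({'Value': custom_agg})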

	Value
Category
A	20
B	10
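
In the output above, the result column is still labelled Value, which does not tell the reader that it holds a range. One way to get a descriptive label is named aggregation, where each keyword argument to agg() is an output column name mapped to a (column, function) pair. This is a minimal sketch, assuming a pandas version that supports named aggregation (0.25 or later); the output names value_range and value_total are chosen here for illustration.

```python
import pandas as pd

# Same sample data as in section 1
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

# Each keyword is the output column name; the tuple is (source column, function)
summary = df.groupby('Category').agg(
    value_range=('Value', lambda x: x.max() - x.min()),  # custom lambda
    value_total=('Value', 'sum'),                        # built-in by name
)

print(summary)
```

Named aggregation also lets a lambda and a built-in function sit side by side in one call without producing a generic column label for the lambda.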

4. Multiple Aggregations

You can apply multiple aggregation functions at once to get comprehensive statistics for each group.

Applying Multiple Built-in Functions

Calculate count, sum, min, max, and mean for each category:

df.groupby('Category')['Value'].agg(['count', 'sum', 'min', 'max', 'mean'])

	count	sum	min	max	mean
Category
A	3	60	10	30	20.0
B	2	40	15	25	20.0
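
The dictionary syntax becomes more useful when different columns need different functions, and it can be combined with the list syntax shown above. The sample data has only one numeric column, so the sketch below adds a hypothetical Quantity column (not part of the original dataset) purely to illustrate the idea.

```python
import pandas as pd

# Sample data from section 1 plus a hypothetical 'Quantity' column
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
    'Quantity': [1, 2, 3, 4, 5],   # assumption: added for illustration only
})

# One function per column: total Value, average Quantity
per_column = df.groupby('Category').agg({'Value': 'sum', 'Quantity': 'mean'})

# A list of functions for one column, a single function for another.
# The result has two-level column labels such as ('Value', 'min').
mixed = df.groupby('Category').agg({'Value': ['min', 'max'], 'Quantity': 'sum'})

print(per_column)
print(mixed)
```

When any dictionary value is a list, the result columns form a MultiIndex, so individual results are selected with a tuple, for example mixed[('Value', 'max')].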

5. Named Custom Functions

For more complex logic, you can define named functions and use them in aggregations.

Creating a Custom Mean Function

Define a function that calculates the mean. The built-in 'mean' already does this, so the function exists only to show how a named function plugs into agg():

def custom_mean(values):
    return sum(values) / len(values)

df.groupby('Category')['Value'].agg(custom_mean)

Category
A    20.0
B    20.0
Name: Value, dtype: float64
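
Named functions can also be passed in a list, in which case each result column takes the function's name. The sketch below reuses custom_mean, checks it against the built-in mean, and adds a second function, spread_ratio, which is a hypothetical example of domain-specific logic (range divided by mean) and not something from the original notebook.

```python
import pandas as pd

# Same sample data as in section 1
df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 15, 20, 25, 30],
})

def custom_mean(values):
    """Arithmetic mean, written out by hand."""
    return sum(values) / len(values)

def spread_ratio(values):
    """Hypothetical metric: range of the group divided by its mean."""
    return (values.max() - values.min()) / values.mean()

grouped = df.groupby('Category')['Value']

# Result columns are named after the functions: custom_mean, spread_ratio
result = grouped.agg([custom_mean, spread_ratio])

# Sanity check: the hand-written mean should match the built-in one
matches_builtin = (result['custom_mean'] == grouped.mean()).all()

print(result)
print(matches_builtin)
```

Comparing a custom function against a built-in equivalent on a small dataset, as done with matches_builtin, is a cheap way to catch mistakes before applying the function to real data.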

Summary

Data aggregation is a powerful tool for summarizing and analyzing grouped data. In this notebook, you learned:

🔧 Built-in Functions

sum, mean, max: Standard statistical aggregations
Dictionary syntax: agg({'column': 'function'})
Multiple functions: agg(['func1', 'func2'])

🎯 Custom Functions

Lambda functions: Quick, inline custom logic
Named functions: Complex logic with reusable functions
Flexible application: Apply to specific columns or entire groups

💡 Key Concepts

Dictionary Aggregation: Specify different functions for different columns
List Aggregation: Apply multiple functions to the same column
Custom Logic: Create domain-specific aggregations

🚀 Best Practices

Prefer built-in functions when possible: pandas runs them through optimized code paths, while a custom Python function is called once per group
Use lambda functions for simple custom logic
Use named functions for complex, reusable operations
Choose appropriate aggregations based on your data and analysis goals

📊 Next Steps

Explore groupby with multiple columns
Learn about transformation and filtering operations
Practice with real datasets to create meaningful aggregations

Mastering aggregation functions will significantly enhance your data analysis capabilities! 🎯📈